Abstract. Computing the first few singular vectors of a large matrix is a
problem that frequently comes up in statistics and numerical analysis. Given 
the presence of noise, exact calculation is hard to achieve, and the following 
problem is of importance: 

How much does a small perturbation to the matrix change the singular vectors?

Answering this question, classical theorems, such as those of Davis-Kahan 
and Wedin, give tight estimates for the worst-case scenario. In this paper, we 
show that if the perturbation (noise) is random and our matrix has low rank, 
then better estimates can be obtained. Our method relies on high dimensional 
geometry and is different from those used in earlier papers.



MSC indices: 65F15, 15A42, 62H30 



1. Introduction 

An important problem that appears in various areas of applied mathematics (in 
particular statistics, computer science and numerical analysis) is to compute the 
first few singular vectors of a large matrix. Among others, this problem lies at 
the heart of PCA (Principal Component Analysis), which has a very wide range of 
applications (for many examples, see [21 [S] and the references therein). 

The basic setting of the problem is as follows: 

Problem 1. Given a matrix $A$ of size $n \times n$ with singular values $\sigma_1 \ge \dots \ge \sigma_n \ge 0$, let $v_1, \dots, v_n$ be the corresponding (unit) singular vectors. Compute $v_1, \dots, v_k$, for some $k \le n$.



Typically $n$ is large and $k$ is relatively small. As a matter of fact, in many applications $k$ is a constant independent of $n$. For example, to obtain a visualization of a large set of data, one often sets $k = 2$ or $3$. The assumption that $A$ is a square matrix is for convenience, and our analysis can be carried out with nominal modification for rectangular matrices.

We use asymptotic notation such as $\Theta, \Omega, O$ under the assumption that $n \to \infty$. The vectors $v_1, \dots, v_k$ are not unique. However, if $\sigma_1, \dots, \sigma_k$ are different, then they are determined up to sign. We assume this is the case in all discussions. (In fact, as the reader will see, the gap $\delta_i := \sigma_i - \sigma_{i+1}$ plays a crucial role.) For a vector $v$, $\|v\|$ denotes its $L_2$ norm. For a matrix $A$, $\|A\| = \sigma_1(A)$ denotes its spectral norm.

1.1. Classical perturbation bounds. The matrix $A$, which represents some sort of data, is often perturbed by noise. Thus, one typically works with $A + E$, where $E$ represents the noise. A natural and important problem is to estimate the influence of noise on the vectors $v_1, \dots, v_k$. We denote by $v_1', \dots, v_k'$ the first $k$ singular vectors of $A + E$.

For the sake of presentation, we restrict ourselves to the case $k = 1$ (the first singular vector). Our analysis extends easily to the general case, discussed in Section 4.

The following question is of importance.
Question 2. When is $v_1'$ a good approximation of $v_1$?

A convenient way to measure the distance between two unit vectors $v$ and $v'$ is to look at $\sin \angle(v, v')$, where $\angle(v, v')$ is the angle between the vectors, taken in $[0, \pi/2]$. To make the problem more quantitative, let us fix a small parameter $\epsilon > 0$, which represents a desired accuracy. Our question now is to find a sufficient condition for the matrix $A$ which guarantees that $\sin \angle(v_1, v_1') \le \epsilon$. It turns out that the key parameter to look at is the gap (or separation)

$$\delta := \sigma_1 - \sigma_2,$$

between the first and second singular values of A. Classical results in numerical 
linear algebra yield 

Corollary 3. For any given $\epsilon > 0$, there is $C = C(\epsilon) > 0$ such that if $\delta \ge C\|E\|$, then

$$\sin \angle(v_1, v_1') \le \epsilon.$$

This follows from a well-known result of Wedin.
Theorem 4. (Wedin sin theorem) There is a positive constant $C$ such that

$$\sin \angle(v_1, v_1') \le C\,\frac{\|E\|}{\delta}. \tag{1}$$

In the case when $A$ and $A + E$ are hermitian, this statement is a special case of the famous Davis-Kahan $\sin\theta$ theorem. Wedin [7] extended the Davis-Kahan theorem to non-hermitian matrices, resulting in a general theorem that contains Theorem 4 as a special case (see [5, Chapter 8] for more discussion and history).
Let us consider the following simple but illustrative example [2]. Let $A$ be the matrix

$$A = \begin{pmatrix} 1+\epsilon & 0 \\ 0 & 1-\epsilon \end{pmatrix}.$$

Evidently, the singular values of $A$ are $1+\epsilon$ and $1-\epsilon$, with corresponding singular vectors $(1, 0)$ and $(0, 1)$. Let $E$ be

$$E = \begin{pmatrix} -\epsilon & \epsilon \\ \epsilon & \epsilon \end{pmatrix},$$

where $\epsilon$ is a small positive number. The perturbed matrix $A + E$ has the form

$$A + E = \begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}.$$

The singular values of $A + E$ are also $1+\epsilon$ and $1-\epsilon$. However, the corresponding singular vectors are now $(\frac{1}{\sqrt 2}, \frac{1}{\sqrt 2})$ and $(\frac{1}{\sqrt 2}, -\frac{1}{\sqrt 2})$, no matter how small $\epsilon$ is. This example shows that the consideration of the gap $\delta$ is necessary, and also that Theorem 4 is sharp, up to a constant factor.
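
The example can be checked numerically. The sketch below assumes the matrices $A = \mathrm{diag}(1+\epsilon, 1-\epsilon)$ and $E$ with rows $(-\epsilon, \epsilon)$ and $(\epsilon, \epsilon)$, one choice consistent with the singular values and singular vectors stated in the example; the value `eps = 1e-3` and the helper name `top_right_singular_vector` are illustrative only.

```python
import numpy as np

eps = 1e-3  # small perturbation parameter (illustrative value)

# Assumed matrices, chosen to match the singular values/vectors in the text.
A = np.array([[1 + eps, 0.0], [0.0, 1 - eps]])
E = np.array([[-eps, eps], [eps, eps]])


def top_right_singular_vector(M):
    """Return (singular values, first right singular vector) of M."""
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(M)
    return s, Vt[0]


s_A, v1 = top_right_singular_vector(A)
s_B, v1p = top_right_singular_vector(A + E)

# Sine of the angle in [0, pi/2] between the two unit vectors.
cos_angle = abs(float(v1 @ v1p))
sin_angle = np.sqrt(max(0.0, 1.0 - cos_angle**2))

gap = s_A[0] - s_A[1]          # delta = sigma_1 - sigma_2
norm_E = np.linalg.norm(E, 2)  # spectral norm of E

print("singular values of A + E:", s_B)
print("sin angle(v1, v1'):", sin_angle)
print("||E|| / delta:", norm_E / gap)
```

With these assumed matrices the ratio $\|E\|/\delta$ equals $1/\sqrt{2}$ for every $\epsilon$, so the bound (1) gives no decay as $\epsilon \to 0$, in line with the angle staying at $\pi/4$.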

1.2. Random perturbation. Noise (or perturbation) represents errors that come 
from various sources which are frequently of entirely different nature, such as errors 
occurring in measurements, errors occurring in recording and transmitting data, 
errors occurring by rounding etc. It is usually too complicated to model noise 
deterministically, so in practice, one often assumes that it is random. In particular, 
a popular model is that the entries of $E$ are independent random variables with mean 0 and variance 1 (the value 1 is, of course, just a matter of normalization).

For simplicity, we restrict ourselves to a representative case when all entries of $E$ are iid Bernoulli random variables, taking values $\pm 1$ with probability one half. For the treatment of more general models, see Section 5.

Remark 5. We prefer the Bernoulli model over the gaussian one for two reasons. 
First, we believe that in many real-life applications, noise must have a discrete nature (after all, data are finite). So it seems reasonable to use random variables with discrete support to model noise, and Bernoulli is the simplest such variable.
Second, as the reader will see, the analysis for the Bernoulli model easily extends 
to many other models of random matrices (including the gaussian one). On the 
other hand, the analysis for gaussian matrices often relies on special properties of 
the gaussian measure which are not available in other cases. 

We say that an event $\mathcal{E}$ holds almost surely if $P(\mathcal{E}) = 1 - o(1)$; in other words, the probability that $\mathcal{E}$ holds tends to one as $n$ tends to infinity. It is well known that the norm of a random Bernoulli matrix is of order $\sqrt{n}$, almost surely (see Lemma 12). Thus, Theorem 4 implies the following variant of Corollary 3.
Corollary 6. For any given $\epsilon > 0$, there is $C = C(\epsilon) > 0$ such that if $\delta \ge C\sqrt{n}$, then with probability $1 - o(1)$

$$\sin \angle(v_1, v_1') \le \epsilon.$$

1.3. Low dimensional data and improved bounds. In a large variety of problems, the data is of small dimension, namely, $r := \operatorname{rank} A \ll n$. The main point that we would like to make in this paper is that in this setting, the lower bound on $\delta$ can be significantly improved. Let us first present the following (improved) variant of Corollary 6.

Corollary 7. For any positive constant $\epsilon$ there is a positive constant $C = C(\epsilon)$ such that the following holds. Assume that $A$ has rank $r \le \dots$ and $\frac{n}{\sqrt{r \log n}} \le \sigma_1$ and $\delta \ge C\sqrt{r \log n}$. Then with probability $1 - o(1)$

$$\sin \angle(v_1, v_1') \le \epsilon. \tag{2}$$

This result shows that (under the given circumstances) we can approximate $v_1$ closely (by $v_1'$) provided $\delta \ge C\sqrt{r \log n}$, improving the previous assumption $\delta \ge C\sqrt{n}$. Furthermore, the appearance of $\sigma_1$ in the statement is necessary. If $\sigma_1 \ll \sqrt{n}$, then the noise dominates and we cannot expect to recover any good information about $A$ from $A + E$.

Corollary [7] is an easy consequence of the following theorem. 

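Theorem 8. (Probabilistic sin-theorem) For any positive constants $\alpha_1, \alpha_2$ there is a positive constant $C$ such that the following holds. Assume that $A$ has rank $r \le n^{1-\alpha_1}$ and $\sigma_1 := \sigma_1(A) \le n^{\alpha_2}$. Let $E$ be a random Bernoulli matrix. Then with probability $1 - o(1)$

$$\sin^2 \angle(v_1, v_1') \le C \max\left( \frac{\sqrt{r \log n}}{\delta}, \frac{n}{\sigma_1 \delta}, \frac{\sqrt{n}}{\sigma_1} \right). \tag{3}$$

Furthermore, one can remove the term $\frac{\sqrt{n}}{\sigma_1}$ if $\delta \le \frac{1}{2}\sigma_1$.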
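
As a rough numerical illustration (not a proof, and not the authors' experiment), one can draw a low-rank matrix, add Bernoulli noise, and compare $\sin^2 \angle(v_1, v_1')$ with the three quantities inside the maximum in (3). The sizes `n = 400`, `r = 2`, the singular values `3n` and `1.5n`, and the random seed are assumptions made only for this sketch; the constant $C$ is not computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 400, 2                           # illustrative sizes (assumption)
sigma = np.array([3.0 * n, 1.5 * n])    # sigma_1, sigma_2 of A (assumption)
delta = sigma[0] - sigma[1]             # gap between the first two singular values

# Rank-r matrix A = U diag(sigma) V^T with random orthonormal columns U, V.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = (U * sigma) @ V.T

# Bernoulli noise: iid entries equal to +1 or -1 with probability 1/2.
E = rng.choice([-1.0, 1.0], size=(n, n))

# First right singular vector of the perturbed matrix A + E.
_, _, Vt = np.linalg.svd(A + E)
sin2 = 1.0 - float(V[:, 0] @ Vt[0]) ** 2

# The three quantities inside the max of (3), without the constant C.
terms = {
    "sqrt(r log n)/delta": np.sqrt(r * np.log(n)) / delta,
    "n/(sigma_1 delta)": n / (sigma[0] * delta),
    "sqrt(n)/sigma_1": np.sqrt(n) / sigma[0],
}
wedin_ratio = np.linalg.norm(E, 2) / delta  # ||E|| / delta, as in Theorem 4

print(f"sin^2 angle(v1, v1') = {sin2:.3e}")
for name, value in terms.items():
    print(f"{name:>22} = {value:.3e}")
print(f"||E|| / delta = {wedin_ratio:.3e}")
```

A single draw says nothing about the constant $C$ or the $o(1)$ failure probability; the sketch only shows how the quantities in the statement are computed from $A$, $E$ and the gap.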

Let us now consider the general case when we try to approximate the first $k$ singular vectors. Set $\epsilon_k := \sin \angle(v_k, v_k')$ and $s_k := (\epsilon_1^2 + \dots + \epsilon_k^2)^{1/2}$. We can bound $\epsilon_k$ recursively as follows.

Theorem 9. For any positive constants $\alpha_1, \alpha_2, k$ there is a positive constant $C$ such that the following holds. Assume that $A$ has rank $r \le n^{1-\alpha_1}$ and $\sigma_1 := \sigma_1(A) \le n^{\alpha_2}$. Let $E$ be a random Bernoulli matrix. Then with probability $1 - o(1)$

$$\epsilon_k^2 \le C \max\left( \frac{\sqrt{r \log n}}{\delta_k}, \frac{n}{\sigma_k \delta_k}, \frac{\sqrt{n}}{\sigma_k}, \frac{\sigma_1^2 s_{k-1}^2}{\sigma_k \delta_k}, \frac{(\sigma_1 + \sqrt{n})(\sigma_k + \sqrt{n})\, s_{k-1}}{\sigma_k \delta_k} \right). \tag{4}$$

The first three terms in the RHS of (4) mirror those in (3). The last two terms represent the recursive effect.

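To give the reader a feeling about this bound, let us consider the following example. Take $A$ such that $r = n^{o(1)}$, $\sigma_1 = 2n^{\alpha}$, $\sigma_2 = n^{\alpha}$, $\delta_2 = n^{\beta}$, where $\alpha > 1/2 > \beta > 1 - \alpha$ are positive constants. Then $\delta_1 = n^{\alpha}$ and $\epsilon_1^2 \le \max\left( n^{-\alpha + o(1)}, n^{1 - 2\alpha + o(1)} \right)$ almost surely.

Assume that we want to bound $\sin \angle(v_2, v_2')$. The gap $\delta_2 = n^{\beta} = o(n^{1/2})$, so Wedin's theorem (in the general form) does not apply. On the other hand, Theorem 9 implies that almost surely

$$\epsilon_2^2 \le \max\left( n^{-\beta + o(1)}, n^{1/2 - \alpha + o(1)}, n^{1 - \alpha - \beta + o(1)} \right).$$

Thus, we have almost surely

$$\sin \angle(v_2, v_2') \le n^{-\Omega(1)} = o(1).$$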



The angle between two subspaces. Let us mention that if $\sin \angle(v_j, v_j') \le \epsilon$ for all $1 \le j \le k$, then $\sin \angle(V_k, V_k') \le \epsilon$, where $V_k$ ($V_k'$) is the subspace spanned by $v_1, \dots, v_k$ ($v_1', \dots, v_k'$, respectively). The formal (and a bit technical) definition of $\angle(V_k, V_k')$ can be seen in [SIH]. It is important to know that for two subspaces $V, V'$ of the same dimension

$$\sin \angle(V, V') = \|P - P'\|,$$

where $P$ ($P'$) denotes the orthogonal projection onto $V$ ($V'$). Moreover, $\|P - P'\|$ is frequently used as the distance between $V$ and $V'$.
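
The identity between the sine of the angle and the norm of the difference of projections can be checked numerically. The sketch below assumes that $\angle(V, V')$ means the largest principal angle between the subspaces (the text defers the formal definition to its references); the dimensions and the random seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3  # ambient dimension and subspace dimension (illustrative)

# Orthonormal bases of two random k-dimensional subspaces of R^n.
Q1, _ = np.linalg.qr(rng.standard_normal((n, k)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Orthogonal projections onto the two subspaces.
P1 = Q1 @ Q1.T
P2 = Q2 @ Q2.T
proj_dist = np.linalg.norm(P1 - P2, 2)  # spectral norm of P - P'

# Cosines of the principal angles are the singular values of Q1^T Q2;
# the largest principal angle corresponds to the smallest cosine.
cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
sin_largest = np.sqrt(max(0.0, 1.0 - cosines.min() ** 2))

print("||P - P'||              :", proj_dist)
print("sin(largest princ. angle):", sin_largest)
```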

The rest of the paper is organized as follows. In the next section, we present tools from linear algebra and probability. The proofs of Theorems 8 and 9 follow in Sections 3 and 4, respectively. In Section 5 we extend these theorems to other models of random noise, including the gaussian one, and also to matrices $A$ which do not necessarily have low rank.
2. Preliminaries: Linear Algebra and Probability

2.1. Linear Algebra. Fix a system $v_1, \dots, v_n$ of unit singular vectors of $A$. It is well known that $v_1, \dots, v_n$ form an orthonormal basis. (If $A$ has rank $r$, the choice of $v_{r+1}, \dots, v_n$ will turn out to be irrelevant.)

For a vector $v$, if we decompose it as

$$v := a_1 v_1 + \dots + a_n v_n,$$

then

$$\|Av\|^2 = v^* A^* A v = \sum_{i=1}^{n} a_i^2 \sigma_i^2. \tag{5}$$
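
Identity (5) is easy to confirm on a small random instance. In the sketch below the matrix size and the seed are arbitrary; the coefficients $a_i = v \cdot v_i$ are read off from the right singular vectors returned by NumPy's SVD.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6  # illustrative size
A = rng.standard_normal((n, n))

# Rows of Vt are the right singular vectors v_1, ..., v_n of A.
_, sigma, Vt = np.linalg.svd(A)

v = rng.standard_normal(n)
a = Vt @ v  # coefficients a_i = v . v_i in the basis v_1, ..., v_n

lhs = np.linalg.norm(A @ v) ** 2   # ||Av||^2
rhs = np.sum(a**2 * sigma**2)      # sum_i a_i^2 sigma_i^2

print(lhs, rhs)
```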

We will use the Courant-Fischer minimax principle for singular values, which asserts that

$$\sigma_k(M) = \max_{\dim H = k} \; \min_{v \in H,\, \|v\| = 1} \|Mv\|, \tag{6}$$

where $\sigma_k(M)$ is the $k$th largest singular value of $M$.



2.2. $\epsilon$-net lemma. Let $\epsilon$ be a positive number. A set $X$ is an $\epsilon$-net of a set $Y$ if for any $y \in Y$, there is $x \in X$ such that $\|x - y\| \le \epsilon$.

Lemma 10. Let $H$ be a subspace and $S := \{v : \|v\| = 1, v \in H\}$. Let $0 < \epsilon < 1$ be a number and $M$ a linear map. Let $\mathcal{N} \subset S$ be an $\epsilon$-net of $S$. Then there is a vector $w \in \mathcal{N}$ such that

$$\|Mw\| \ge (1 - \epsilon) \max_{v \in S} \|Mv\|.$$

Proof. Let $v$ be the vector where the maximum is attained and let $w$ be a vector in the net closest to $v$ (ties are broken arbitrarily). Then by the triangle inequality

$$\|Mw\| \ge \|Mv\| - \|M(v - w)\|.$$

As $\|v - w\| \le \epsilon$, we have $\|M(v - w)\| \le \epsilon \max_{v \in S} \|Mv\|$, concluding the proof. □
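
A two-dimensional instance makes the net argument concrete. The sketch below takes $H = \mathbb{R}^2$, so $S$ is the unit circle, builds an $\epsilon$-net from equally spaced points, and checks the conclusion of Lemma 10 for a random linear map. The value `eps = 0.25`, the map `M`, and the equally spaced construction are assumptions made for illustration.

```python
import numpy as np

eps = 0.25  # net parameter (illustrative)

# Equally spaced points with angular spacing at most h leave every point of
# the circle within chordal distance 2*sin(h/4) of the net; pick h so that
# this distance is at most eps.
h = 4.0 * np.arcsin(eps / 2.0)
m = int(np.ceil(2.0 * np.pi / h))
angles = 2.0 * np.pi * np.arange(m) / m
net = np.column_stack([np.cos(angles), np.sin(angles)])  # m points on the circle

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 2))  # a linear map from R^2 to R^3

# Largest value of ||Mw|| over the net versus the true maximum over the circle.
best_on_net = np.linalg.norm(net @ M.T, axis=1).max()
true_max = np.linalg.norm(M, 2)

print("net size:", m, " bound (3/eps)^2:", (3.0 / eps) ** 2)
print("max over net:", best_on_net, " (1 - eps) * max over S:", (1 - eps) * true_max)
```

The printed net size can be compared with the volume bound $(3/\epsilon)^d$ for $d = 2$; equally spaced points are far more economical than that general bound requires.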

The following estimate for the minimum size of an $\epsilon$-net of a sphere is well known.
Lemma 11. A unit sphere in $d$ dimensions admits an $\epsilon$-net of size at most $(3\epsilon^{-1})^d$.
Proof. Let $S$ be the sphere in question, centered at $O$, and $\mathcal{N} \subset S$ be a finite subset of $S$ such that the distance between any two points is at least $\epsilon$. If $\mathcal{N}$ is maximal with respect to this property then $\mathcal{N}$ is an $\epsilon$-net. On the other hand, the balls of radius $\epsilon/2$ centered at the points in $\mathcal{N}$ are disjoint subsets of the ball of radius $1 + \epsilon/2$, centered at $O$. Since

$$\frac{1 + \epsilon/2}{\epsilon/2} \le \frac{3}{\epsilon},$$

the claim follows by a volume argument. □

2.3. Probability. We need the following estimate on $\|E\|$ (see [HE]).

Lemma 12. There is a constant $C_0 > 0$ such that the following holds. Let $E$ be a random Bernoulli matrix of size $n$. Then

$$P(\|E\| \ge 3\sqrt{n}) \le \exp(-C_0 n).$$
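
A quick simulation gives a feeling for the scale in Lemma 12. The sketch below draws one $\pm 1$ matrix for each of a few sizes and reports $\|E\|/\sqrt{n}$; the sizes and the seed are arbitrary, and a handful of samples illustrates the order of magnitude only, not the exponential tail bound.

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = [100, 200, 400]  # illustrative matrix sizes

ratios = []
for n in sizes:
    # iid entries equal to +1 or -1 with probability 1/2
    E = rng.choice([-1.0, 1.0], size=(n, n))
    # spectral norm divided by sqrt(n)
    ratios.append(np.linalg.norm(E, 2) / np.sqrt(n))

for n, ratio in zip(sizes, ratios):
    print(f"n = {n:4d}   ||E|| / sqrt(n) = {ratio:.3f}")
```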

Next, we present a lemma which roughly asserts that for any two given vectors $u$ and $v$, the vectors $u$ and $Ev$ are, with high probability, almost orthogonal. We present the proof of this lemma in Appendix A.

Lemma 13. Let $E$ be a random Bernoulli matrix of size $n$. For any fixed unit vectors $u, v$ and positive number $t$

$$P(|u^T E v| \ge t) \le 2\exp(-t^2/16).$$
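
The tail bound of Lemma 13 can be compared with a Monte Carlo estimate. In the sketch below the dimension, the number of trials, the thresholds `ts`, and the particular unit vectors are all illustrative assumptions; the empirical frequencies are only estimates of the true probabilities.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 16, 5000  # illustrative sizes

# Two fixed unit vectors.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

# Independent Bernoulli (+1/-1) matrices, one per trial.
E = rng.choice([-1.0, 1.0], size=(trials, n, n))

# u^T E v for every trial.
samples = np.einsum("i,tij,j->t", u, E, v)

ts = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
empirical = np.array([(np.abs(samples) >= t).mean() for t in ts])
bound = 2.0 * np.exp(-ts**2 / 16.0)

for t, emp, b in zip(ts, empirical, bound):
    print(f"t = {t:.1f}   empirical tail = {emp:.4f}   2 exp(-t^2/16) = {b:.4f}")
```

The bound is far from tight at these thresholds, which is consistent with its role in the paper: it only needs to beat a union bound over a net.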



Now we are ready to state our key lemma. 

Lemma 14. For any constants < < < 1 there is a constant C such 
that the following holds. Assume that A is such that cti < n^'^ and let V :— 
Spanjwi, . . . , for some d > n^^^^ . Then the following holds almost surely. For 
any unit vector v ^ V 



$$\|(A + E)v\|^2 \le \sum_{i=1}^{n} (v \cdot v_i)^2 \sigma_i^2 + C\left(n + \sigma_1\sqrt{d \log n}\right).$$

Proof. It suffices to prove for v belonging to an e-net Af of the unit sphere S in 
V, with e :— ;jq^- With such small e, the error coming from the term (1 — e) (in 
Lemma 10) is swallowed into the error term $O(n + \sigma_1\sqrt{d \log n})$.

By Lemma 11, $|\mathcal{N}| \le (3/\epsilon)^d \le \exp(C_1 d \log n)$, for some constant $C_1$ (which depends on the exponent $\beta_1$ in the upper bound of $\sigma_1$). Thus, using the union bound, it suffices to show that if $C$ is large enough, then

$$P\Big( \|(A + E)v\|^2 \ge \sum_{i=1}^{n} (v \cdot v_i)^2 \sigma_i^2 + C\big(n + \sigma_1\sqrt{d \log n}\big) \Big) \le \exp(-2C_1 d \log n)$$

for any fixed $v \in \mathcal{N}$.
Fix $v \in \mathcal{N}$. By (5),

$$\|(A + E)v\|^2 = \|Av\|^2 + \|Ev\|^2 + 2(Av) \cdot (Ev) = \sum_{i=1}^{n} (v \cdot v_i)^2 \sigma_i^2 + \|Ev\|^2 + 2(Av) \cdot (Ev).$$

Since $\|Av\| \le \sigma_1$, we have, by Lemma 13, that with probability at least $1 - \exp(-C_2 d \log n)$

$$|(Av) \cdot (Ev)| \le C\sigma_1\sqrt{d \log n},$$

where $C_2$ increases with $C$. Thus, by choosing $C$ sufficiently large, we can assume that $C_2 \ge 3C_1$.

Furthermore, by Lemma 12, $\|Ev\| \le 3\sqrt{n}$ with probability at least $1 - \exp(-\Omega(n))$. Combining this with the above bounds, we conclude that for a sufficiently large constant $C$

$$P\Big( \|(A + E)v\|^2 \ge \sum_{i=1}^{n} (v \cdot v_i)^2 \sigma_i^2 + C\big(n + \sigma_1\sqrt{d \log n}\big) \Big) \le \exp(-3C_1 d \log n) + \exp(-\Omega(n)) \le \exp(-2C_1 d \log n),$$

completing the proof. □

3. Proof of Theorem 8

Let $H$ be the subspace spanned by $\{v_1, v_2\}$ and let $u_i$ ($1 \le i \le n$) be the singular vectors of the matrix $A^*$.

First, we give a lower bound for $\sigma_1' := \|A + E\|$. By the minimax principle, we have

$$\sigma_1' = \|A + E\| \ge |u_1^T (A + E) v_1| = |\sigma_1 + u_1^T E v_1|.$$

By Lemma 13 we have, with probability $1 - o(1)$, $|u_1^T E v_1| \le \log\log n$. (The choice of $\log\log n$ is not important. One can replace it by any function that tends slowly to infinity with $n$.)

Thus, we have, with probability $1 - o(1)$, that

$$\|A + E\| \ge \sigma_1 - \log\log n. \tag{7}$$

Our main observation is that, with high probability, any $v$ that is far from $v_1$ would yield $\|(A + E)v\| < \sigma_1 - \log\log n$. Therefore, the first singular vector $v_1'$ of $A + E$ must be close to $v_1$.

Consider a unit vector $v$ and write it as

$$v = c_1 v_1 + c_2 v_2 + \dots + c_r v_r + c_0 u,$$

where $u$ is a unit vector orthogonal to $H := \mathrm{Span}\{v_1, \dots, v_r\}$ and $c_1^2 + \dots + c_r^2 + c_0^2 = 1$. Recall that $r$ is the rank of $A$, so $Au = 0$. Setting $w := c_1 v_1 + \dots + c_r v_r$ and using Cauchy-Schwarz, we have

$$\begin{aligned} \|(A + E)v\|^2 = \|(A + E)w + c_0 Eu\|^2 &\le \|(A + E)w\|^2 + 2c_0 \|(A + E)w\| \|Eu\| + c_0^2 \|Eu\|^2 \\ &\le (1 + c_0^2)\|(A + E)w\|^2 + (1 + c_0^2)\|Eu\|^2. \end{aligned}$$

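By Lemma 12 we have, with probability $1 - o(1)$, that $\|Eu\| \le 3\sqrt{n}$ for every unit vector $u$. Furthermore, by Lemma 14 we have, with probability $1 - o(1)$,

$$\|(A + E)w\|^2 \le \sum_{i=1}^{r} (w \cdot v_i)^2 \sigma_i^2 + O(\sigma_1\sqrt{r \log n} + n)$$

for every vector $w \in H$ of length at most 1.
Since

$$\sum_{i=1}^{r} (w \cdot v_i)^2 \sigma_i^2 = \sum_{i=1}^{r} c_i^2 \sigma_i^2 \le (1 - c_0^2)\sigma_1^2 - (1 - c_0^2 - c_1^2)(\sigma_1^2 - \sigma_2^2),$$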

we can conclude that with probability $1 - o(1)$, the first singular vector of $A + E$, written in the form $v = c_1 v_1 + \dots + c_r v_r + c_0 u$, satisfies

$$\frac{1}{1 + c_0^2} \|(A + E)v\|^2 \le (1 - c_0^2)\sigma_1^2 - (1 - c_0^2 - c_1^2)(\sigma_1^2 - \sigma_2^2) + O(\sigma_1\sqrt{r \log n} + n). \tag{8}$$

Notice that $c_0 \le 1$, so the term involving $c_0 u$ is swallowed into $O(n)$. By (7) and the fact that $\frac{1}{1 + c_0^2} \ge 1 - c_0^2$, we have

$$\frac{1}{1 + c_0^2} \|(A + E)v\|^2 \ge (1 - c_0^2)(\sigma_1 - \log\log n)^2.$$



Comparing this with (8) and noticing that both $\sigma_1 \log\log n$ and $(\log\log n)^2$ are $o(\sigma_1\sqrt{r \log n})$, we obtain, for some properly chosen constant $C$, that

$$(1 - c_1^2)\sigma_1\delta - c_0^2\sigma_1^2 \le -c_0^2\sigma_2^2 + C(\sigma_1\sqrt{r \log n} + n).$$

Before concluding the proof, let us derive a bound on $c_0$. We can show that with probability $1 - o(1)$

$$c_0^2 = O\Big(\frac{\sqrt{n}}{\sigma_1}\Big). \tag{9}$$

To verify this, we again use the bound $\|(A + E)v\| \ge \sigma_1 - \log\log n$. On the other hand, by the triangle inequality and Lemma 12 we have with probability $1 - o(1)$

$$\|(A + E)v\| \le \|Av\| + \|Ev\| \le \sigma_1\sqrt{1 - c_0^2} + 3\sqrt{n},$$

from which (9) follows by a simple computation.

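Without loss of generality, we can assume that $C \ge 1$. If $\sigma_2 < \frac{1}{2}\sigma_1$, then $\delta \ge \frac{1}{2}\sigma_1$ and

$$1 - c_1^2 \le \frac{c_0^2\sigma_1^2}{\sigma_1^2/2} + O\Big(\frac{\sigma_1\sqrt{r \log n} + n}{\sigma_1^2}\Big) = O\Big(\frac{\sqrt{r \log n}}{\sigma_1}\Big) + O\Big(\frac{n}{\sigma_1^2}\Big) + O\Big(\frac{\sqrt{n}}{\sigma_1}\Big). \tag{10}$$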



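In the case $\sigma_2 \ge \frac{1}{2}\sigma_1$, $c_0^2\sigma_2^2 \ge \frac{1}{4}c_0^2\sigma_1^2$, so

$$(1 - c_1^2)\sigma_1\delta \le C(\sigma_1\sqrt{r \log n} + n),$$

which implies

$$1 - c_1^2 \le C\Big(\frac{\sqrt{r \log n}}{\delta} + \frac{n}{\sigma_1\delta}\Big). \tag{11}$$

Notice that $\sin^2 \angle(v_1, v_1') = \sin^2 \angle(v_1, v) = 1 - c_1^2$. The desired claim follows from (10) and (11).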

Remark 15. One can improve the error term to i^)^^"^- However, this proof 
is more technical and harder to generalize. 
4. Proof of Theorem 9

Similar to the previous proof, we start with a lower bound for $\sigma_k'$, the $k$th largest singular value of $A + E$. Using the minimax principle, we have

$$\sigma_k' \ge |u_k^T (A + E) v_k| \ge \sigma_k - \log\log n$$

with probability $1 - o(1)$.

We need to consider $\|(A + E)v\|$ for a unit vector $v$ orthogonal to $v_1', \dots, v_{k-1}'$. We write (as before)

$$v := c_1 v_1 + \dots + c_r v_r + c_0 u = w + c_0 u.$$

If $v$ is the $k$th singular vector of $A + E$, then $v \cdot v_j' = 0$ for $1 \le j \le k - 1$, and we obtain

$$|c_j| = |v \cdot v_j| = |v \cdot (v_j - v_j')| \le \|v_j - v_j'\| \le 2\sin \angle(v_j, v_j') = 2\epsilon_j.$$
As in the previous proof, we consider the inequality

$$\begin{aligned} \|(A + E)v\|^2 = \|(A + E)w + c_0 Eu\|^2 &\le \|(A + E)w\|^2 + 2c_0 \|(A + E)w\| \|Eu\| + c_0^2 \|Eu\|^2 \\ &\le (1 + c_0^2)\|(A + E)w\|^2 + (1 + c_0^2)\|Eu\|^2. \end{aligned}$$

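We split $w = \bar{w}_k + w_k$, where $\bar{w}_k := c_1 v_1 + \dots + c_{k-1} v_{k-1}$ and $w_k := c_k v_k + \dots + c_r v_r$. We have

$$\|(A + E)w\|^2 = \|(A + E)(\bar{w}_k + w_k)\|^2 \le \|(A + E)\bar{w}_k\|^2 + \|(A + E)w_k\|^2 + 2\|(A + E)\bar{w}_k\| \|(A + E)w_k\|.$$

Using Lemma 14 we have

$$\|(A + E)w_k\|^2 \le c_k^2\sigma_k^2 + \dots + c_r^2\sigma_r^2 + O(\sigma_k\sqrt{r \log n} + n). \tag{12}$$

The term $\|(A + E)\bar{w}_k\|^2$ can be bounded, rather generously, by

$$O\big((\sigma_1 + \sqrt{n})^2 (c_1^2 + \dots + c_{k-1}^2)\big) = O\big((\sigma_1^2 + n)\, s_{k-1}^2\big). \tag{13}$$

Moreover,

$$\|(A + E)w_k\| \|(A + E)\bar{w}_k\| = O\big((\sigma_k + \sqrt{n})(\sigma_1 + \sqrt{n}) \|w_k\| \|\bar{w}_k\|\big) = O\big((\sigma_1 + \sqrt{n})(\sigma_k + \sqrt{n})\, s_{k-1}\big). \tag{14}$$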
Repeating the calculations in the previous proof, we have, with probability 1 — o(l) 



(l-cD(^fc-^fe+i)- J^fc < c'(CT|-a^+i)-c^a^+i+0(afcv/^bi^+n)+0(a24_i + (ai+V^)((7fe+V^)sfc_i). 

j=i 

We can bound cq as follows 



(15) cg + 4-i=0(^ + ^). 

By considering the two cases $\sigma_{k+1} \ge \frac{1}{2}\sigma_k$ and $\sigma_{k+1} < \frac{1}{2}\sigma_k$, the desired bound follows.



5. Extensions 

In this section, we extend our results to other models of random matrices. It is easy to see that we did not rely very heavily on properties of the Bernoulli random variable. All we need is a model of random matrices so that Lemmas 12 and 13 (or sufficiently strong variants) hold.

Both of these lemmas hold for the case where the noise is gaussian (instead of Bernoulli). In fact, Lemma 13 is trivial as $u^T E v$ has distribution $N(0, 1)$.

Both lemmas hold in the case the entries of $E$ are bounded by a universal constant $K$. For the proof of Lemma 12 see [IJ[S]. For the proof of Lemma 13 see Remark 17.

Quite often, the boundedness condition can be replaced by the condition of having a rapidly decaying tail (such as sub-gaussian), using either more advanced concentration tools (see [5]) or a truncation argument (see [in]). We do not pursue these matters here.

We can also extend our results to a matrix $A$ which does not have low rank, but can be well approximated by one. In this case, we consider $A = A' + B$, where $A'$ has small rank (say $r$) and $B$ is very small. We can then apply, say, Theorem 8 to bound $\|v_1(A') - v_1(A' + E)\|$ and Theorem 4 to bound $\|v_1(A') - v_1(A)\|$, and then use the triangle inequality. As a result, the RHS of (3) will have an extra term. The reader is invited to work out the details.
Finally, our analysis also extends fairly easily to the case when $E$ is a hermitian random matrix (either Wigner or Wishart model) and $A$ is hermitian. The details and a few applications will appear elsewhere.

Appendix A. Proof of Lemma 13

As $u^T E v = \sum_{i,j} u_i v_j \xi_{ij}$, where $u = (u_i)_{i=1}^{n}$, $v = (v_j)_{j=1}^{n}$ and the $\xi_{ij}$ are the entries of $E$, Lemma 13 follows from

Lemma 16. Let $S := c_1\xi_1 + \dots + c_n\xi_n$, where the $\xi_i$ are iid Bernoulli random variables and the $c_i$ are real numbers such that $\sum_{i=1}^{n} c_i^2 = 1$. Then for any number $t > 0$

$$P(|S| \ge t) \le 2\exp(-t^2/16).$$

Proof. Without loss of generality, we can assume that $|c_i|$ is decreasing in $i$ and $l$ is the last index such that $|c_l| \ge 2/t$. As $\sum_{i=1}^{n} c_i^2 = 1$, $l \le t^2/4$. By Cauchy-Schwarz,

$$|c_1\xi_1 + \dots + c_l\xi_l|^2 \le l \sum_{i=1}^{l} c_i^2 \le \frac{t^2}{4},$$

which implies that with probability one $|c_1\xi_1 + \dots + c_l\xi_l| \le t/2$. Therefore,

$$P(|S| \ge t) \le P(|S'| \ge t/2),$$

where $S' := \sum_{i=l+1}^{n} c_i\xi_i$.
We can bound $P(|S'| \ge t/2)$ by the standard Laplace-transform argument. Set $z := t/4$. Thanks to independence, we have

$$P(S' \ge t/2) = P(\exp(zS') \ge e^{tz/2}) \le e^{-tz/2}\, E(\exp(zS')) = e^{-tz/2} \prod_{i=l+1}^{n} E\exp(z c_i \xi_i).$$

On the other hand, as $|z c_i| \le 1$, it is easy to show that

$$E\exp(z c_i \xi_i) \le 1 + (z c_i)^2 \le \exp(z^2 c_i^2).$$

Together, we obtain

$$P(S' \ge t/2) \le e^{-tz/2} \exp\Big(z^2 \sum_{i=l+1}^{n} c_i^2\Big) \le \exp\Big(z^2 - \frac{tz}{2}\Big) = \exp\Big(-\frac{t^2}{16}\Big).$$
Similarly

$$P(S' \le -t/2) = P(-S' \ge t/2) \le \exp\Big(-\frac{t^2}{16}\Big),$$

concluding the proof. □
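
For a small number of variables the tail in Lemma 16 can be computed exactly by enumerating all sign patterns, which gives a sanity check on the statement and on the elementary inequality $E\exp(x\xi) = \cosh x \le 1 + x^2 \le \exp(x^2)$ for $|x| \le 1$ used in the Laplace-transform step. The choice `n = 12`, the coefficient vector `c`, and the grid of thresholds are assumptions made for this sketch.

```python
import itertools

import numpy as np

n = 12  # small enough to enumerate all 2^n sign patterns

# Real coefficients normalized so that sum_i c_i^2 = 1.
c = np.arange(1, n + 1, dtype=float)
c /= np.linalg.norm(c)

# Every sign pattern (xi_1, ..., xi_n) is equally likely.
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
S = signs @ c

ts = np.linspace(0.5, 4.0, 8)
exact_tail = np.array([(np.abs(S) >= t).mean() for t in ts])
bound = 2.0 * np.exp(-ts**2 / 16.0)

# For a +1/-1 variable xi, E exp(x * xi) = cosh(x); check the two
# inequalities used in the proof on a grid of x in [-1, 1].
x = np.linspace(-1.0, 1.0, 201)
mgf_ok = bool(np.all(np.cosh(x) <= 1 + x**2) and np.all(1 + x**2 <= np.exp(x**2)))

for t, p, b in zip(ts, exact_tail, bound):
    print(f"t = {t:.1f}   P(|S| >= t) = {p:.4f}   bound = {b:.4f}")
print("moment generating function inequalities hold on the grid:", mgf_ok)
```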

Remark 17. The same proof works for the $\xi_i$ being arbitrary independent random variables with mean 0 and variance 1, uniformly bounded by a constant $K$. In this case, the constant 16 is replaced by a constant depending on $K$.